Add vif: OCaml 5 web framework with Miou multicore scheduler#85

Open
BennyFranciscus wants to merge 5 commits into MDA2AV:main from BennyFranciscus:add-vif

Conversation

@BennyFranciscus
Collaborator

Replaces #84 (closed due to branch protection blocking fixes).

What

Adds vif — an OCaml 5 web framework built on the Miou multicore scheduler, replacing the Dream entry per discussion on PR #25.

Stack

  • vif 0.0.1~beta2 + httpcats (HTTP/1.1 + HTTP/2)
  • Miou multicore scheduler (OCaml 5 domains/effects)
  • Yojson for JSON serialization
  • sqlite3-ocaml for database queries
  • decompress (pure OCaml gzip via vif's built-in ~compression:`Gzip)

Changes from #84

  • Fixed type error: Uri.any for query wildcards vs Type.any for POST content-type matching
  • Used nil for routes that don't accept query parameters (pipeline, json, compression, upload)

Endpoints

All 8 standard HttpArena endpoints implemented:

  • /pipeline — simple text response
  • /baseline1 GET/POST — query param sum + body
  • /baseline2 — query param sum
  • /json — dataset processing
  • /compression — gzip compressed response
  • /upload — streaming body read with byte counting
  • /db — SQLite query with parameterized range

vif is built on httpcats and the Miou cooperative/preemptive scheduler,
taking advantage of OCaml 5 domains for multicore HTTP serving.

Key highlights:
- Pure OCaml stack (TLS, crypto, compression all in OCaml)
- Typed routing checked at compile time
- httpcats engine for HTTP/1.1 parsing and connection management
- Gzip compression via decompress (pure OCaml zlib implementation)

Endpoints: baseline, pipelined, json, compression, upload, db, noisy, mixed

References:
- https://github.com/robur-coop/vif
- https://github.com/robur-coop/httpcats
- Tutorial: https://robur-coop.github.io/vif/
…POST content

The routing DSL has two 'any' values:
- Uri.any: wildcard for query parameters (used with /??)
- Type.any: wildcard for content types (used with post)

Opening Type shadowed Uri.any, causing a type mismatch on query routes.

Fix: remove 'open Type', qualify post content-type as Type.any,
and use nil for routes that don't need query params.
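The shadowing behavior behind the bug can be shown in plain OCaml. This is a minimal illustration using stand-in modules, not vif's actual Uri/Type definitions: two modules each expose a value named `any`, and `open`ing the second makes the unqualified name resolve to it.

```ocaml
(* Stand-ins for vif's Uri and Type modules: both define [any]. *)
module Uri = struct
  let any = "uri-wildcard"
end

module Type = struct
  let any = "content-type-wildcard"
end

open Type

(* After [open Type], a bare [any] refers to Type.any, not Uri.any.
   In vif this changed the type expected by query routes; the fix
   qualifies each use explicitly instead of relying on [open]. *)
let shadowed = any        (* = "content-type-wildcard" *)
let intended = Uri.any    (* = "uri-wildcard" *)
```

Dropping the `open` and writing `Type.any` / `Uri.any` at each use site removes the ambiguity entirely, which is what the fix above does.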
@BennyFranciscus BennyFranciscus requested a review from MDA2AV as a code owner March 23, 2026 23:53
@jerrythetruckdriver

json_endpoint calls process_items on every request while compression uses the pre-computed large_payload. You already solved this problem 10 lines above — just cache the small dataset result at startup too. Third time I've seen this pattern across your PRs.

@BennyFranciscus
Collaborator Author

Good catch, yeah that's dumb — the small dataset is static too so there's zero reason to recompute it per request. I'll cache the processed result at startup like large_payload does. Thanks for flagging it (again apparently lol).

json_endpoint was calling process_items on every request while the
dataset is static. Cache the processed result at startup, matching
what large_payload already does for the compression endpoint.

Co-authored-by: jerrythetruckdriver
@BennyFranciscus
Collaborator Author

Fixed in 0b60e6e: small_payload now caches the processed result at startup, same pattern as large_payload. No more per-request process_items calls on the JSON endpoint. Appreciate the consistency check 🤝
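The shape of the fix, sketched in plain OCaml (`dataset` and `process_items` are hypothetical stand-ins, not the PR's actual code): a top-level binding is evaluated once at module initialisation, so every request handler reuses the result instead of recomputing it.

```ocaml
(* Hypothetical stand-ins for the static dataset and its processing. *)
let dataset = List.init 100 (fun i -> i)
let process_items items = List.fold_left ( + ) 0 items

(* Anti-pattern: recomputes the same static result on every request. *)
let json_endpoint_slow () = process_items dataset

(* Fix: compute once at startup; top-level bindings run exactly once
   when the module is initialised, so handlers just read the cache. *)
let small_payload = process_items dataset
let json_endpoint () = small_payload
```

Both versions return the same value; only the per-request cost differs.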

@MDA2AV
Owner

MDA2AV commented Mar 24, 2026

/benchmark

@github-actions
Contributor

🚀 Benchmark run triggered for vif (all profiles). Results will be posted here when done.

@github-actions
Contributor

Benchmark Results

Framework: vif | Profile: all profiles

vif / baseline / 512c (p=1, r=0, cpu=unlimited)
  Best: 3130 req/s (CPU: 563.6%, Mem: 705.0MiB) ===

vif / baseline / 4096c (p=1, r=0, cpu=unlimited)
  Best: 3179 req/s (CPU: 589.1%, Mem: 701.6MiB) ===

vif / baseline / 16384c (p=1, r=0, cpu=unlimited)
  Best: 4585 req/s (CPU: 551.0%, Mem: 709.5MiB) ===

vif / pipelined / 512c (p=16, r=0, cpu=unlimited)
  Best: 25307 req/s (CPU: 787.4%, Mem: 661.3MiB) ===

vif / pipelined / 4096c (p=16, r=0, cpu=unlimited)
  Best: 13944 req/s (CPU: 669.8%, Mem: 664.9MiB) ===

vif / pipelined / 16384c (p=16, r=0, cpu=unlimited)
  Best: 9301 req/s (CPU: 672.5%, Mem: 672.9MiB) ===

vif / limited-conn / 512c (p=1, r=10, cpu=unlimited)
  Best: 3233 req/s (CPU: 631.9%, Mem: 702.5MiB) ===

vif / limited-conn / 4096c (p=1, r=10, cpu=unlimited)
  Best: 2934 req/s (CPU: 581.7%, Mem: 697.6MiB) ===

vif / json / 4096c (p=1, r=0, cpu=unlimited)
  Best: 8143 req/s (CPU: 595.1%, Mem: 660.3MiB) ===

vif / json / 16384c (p=1, r=0, cpu=unlimited)
  Best: 6887 req/s (CPU: 658.6%, Mem: 656.7MiB) ===

vif / upload / 64c (p=1, r=0, cpu=unlimited)
  Best: 0 req/s (CPU: 0%, Mem: 0MiB) ===
Full log
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   62.21ms   11.20ms   121.20ms   841.40ms    3.33s

  82298 requests in 5.00s, 40717 responses
  Throughput: 8.14K req/s
  Bandwidth:  66.60MB/s
  Status codes: 2xx=40717, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 40716 / 40717 responses (100.0%)
  Reconnects: 40733
  CPU: 595.1% | Mem: 660.3MiB

[run 3/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/json
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   87.30ms   8.53ms   123.60ms    1.67s    3.33s

  35815 requests in 5.00s, 17738 responses
  Throughput: 3.55K req/s
  Bandwidth:  29.01MB/s
  Status codes: 2xx=17738, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 17738 / 17738 responses (100.0%)
  Reconnects: 17738
  CPU: 355.4% | Mem: 661.3MiB

=== Best: 8143 req/s (CPU: 595.1%, Mem: 660.3MiB) ===
[dry-run] Results not saved (use --save to persist)
httparena-bench-vif
httparena-bench-vif

==============================================
=== vif / json / 16384c (p=1, r=0, cpu=unlimited) ===
==============================================
af1e63a41cf76496bc4ce6ff8c51da3924cdfd7c8b84ec5eee2da16773eff1f0
[wait] Waiting for server...
[ready] Server is up

[run 1/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/json
  Threads:   64
  Conns:     16384 (256/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   90.61ms   30.50ms   131.70ms   920.40ms    3.34s

  73468 requests in 5.00s, 34439 responses
  Throughput: 6.88K req/s
  Bandwidth:  56.32MB/s
  Status codes: 2xx=34439, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 34439 / 34439 responses (100.0%)
  Reconnects: 34496
  CPU: 658.6% | Mem: 656.7MiB

[run 2/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/json
  Threads:   64
  Conns:     16384 (256/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   191.87ms   119.90ms   327.80ms    1.85s    3.58s

  35209 requests in 5.00s, 15062 responses
  Throughput: 3.01K req/s
  Bandwidth:  24.64MB/s
  Status codes: 2xx=15062, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 15062 / 15062 responses (100.0%)
  Reconnects: 15076
  CPU: 560.0% | Mem: 663.4MiB

[run 3/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/json
  Threads:   64
  Conns:     16384 (256/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   159.21ms   114.20ms   321.40ms    1.79s    3.45s

  43614 requests in 5.00s, 19592 responses
  Throughput: 3.92K req/s
  Bandwidth:  32.04MB/s
  Status codes: 2xx=19592, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 19592 / 19592 responses (100.0%)
  Reconnects: 19607
  CPU: 620.6% | Mem: 664.0MiB

=== Best: 6887 req/s (CPU: 658.6%, Mem: 656.7MiB) ===
[dry-run] Results not saved (use --save to persist)
httparena-bench-vif
httparena-bench-vif

==============================================
=== vif / upload / 64c (p=1, r=0, cpu=unlimited) ===
==============================================
0d7d95fa808d112982d2c7c95e76a7a9fa4da0ca046ae4d35b0c5e989b46bca3
[wait] Waiting for server...
[ready] Server is up

[run 1/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     64 (1/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency    5.03s    5.00s    5.00s    5.00s    5.00s

  8 requests in 10.49s, 4 responses
  Throughput: 0 req/s
  Bandwidth:  40B/s
  Status codes: 2xx=4, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 4 / 4 responses (100.0%)
  Latency overflow (>5s): 4
  Reconnects: 4
  CPU: 1190.7% | Mem: 2.1GiB

[run 2/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     64 (1/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency      0us      0us      0us      0us      0us

  0 requests in 5.00s, 0 responses
  Throughput: 0 req/s
  Bandwidth:  0B/s
  Status codes: 2xx=0, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 0 / 0 responses (0.0%)
  CPU: 1180.0% | Mem: 2.8GiB

[run 3/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     64 (1/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency      0us      0us      0us      0us      0us

  0 requests in 5.00s, 0 responses
  Throughput: 0 req/s
  Bandwidth:  0B/s
  Status codes: 2xx=0, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 0 / 0 responses (0.0%)
  CPU: 1031.7% | Mem: 3.8GiB

=== Best: 0 req/s (CPU: 0%, Mem: 0MiB) ===
httparena-bench-vif
httparena-bench-vif
[restore] Restoring CPU governor to powersave...

@BennyFranciscus
Collaborator Author

Nice, thanks for running the benchmarks!

Results are roughly where I expected for a first OCaml 5 entry. The pipelined numbers are the standout — 25K req/s at 512c shows Miou's multicore scheduler handles connection multiplexing well. The drop-off to 9.3K at 16384c is steep though, might be worth looking at how httpcats handles fd pressure at that scale.

Baseline sits around 3-4.5K which is modest, but honestly not bad for a beta framework that's doing OCaml effects-based scheduling under the hood.

The upload endpoint is broken though — 0 req/s, memory ballooned to 3.8GiB and it just stopped responding after a handful of requests. Looking at the logs it seems like the streaming body read is accumulating in memory instead of being processed incrementally. I'll dig into the vif upload handler — probably need to handle the body chunks differently with the Miou scheduler to avoid buffering everything.

Will push a fix for upload and we can re-run.

@MDA2AV
Owner

MDA2AV commented Mar 24, 2026

@BennyFranciscus cpu usage is very low, are we doing async or parallel properly?

- Set domains = Domain.recommended_domain_count() - 1 in Vif.config
  (was defaulting to 1 domain = single core, explaining ~550% CPU
  instead of ~8000% on the benchmark machine)
- Replace Flux.Stream-based body reading with direct Source
  pull loop to fix upload OOM (memory was ballooning to 3.8GiB
  then timing out at 0 req/s)
@BennyFranciscus
Collaborator Author

Good catch — the low CPU usage was because I wasn't setting the domains parameter in Vif.config, so it was defaulting to 1 domain (= single core). That's why we were seeing ~550% CPU instead of saturating all cores.

Fixed in dce05bf: now uses Domain.recommended_domain_count() - 1 which on the benchmark machine should give us all available cores working in parallel via Miou's multicore scheduler.
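For reference, the stdlib call involved — `Domain.recommended_domain_count` is part of OCaml 5's `Domain` module. A small sketch of the worker-count computation described above, with a clamp added here (my addition, not necessarily in the PR) so single-core machines still get one worker:

```ocaml
(* Worker-domain count as described in the fix: all recommended
   domains minus one, reserving a core for the main domain.
   [max 1] clamps the result so a 1-core machine still works. *)
let worker_domains () =
  max 1 (Domain.recommended_domain_count () - 1)

let () =
  Printf.printf "spawning %d worker domains\n" (worker_domains ())
```

With the default of 1 domain, all fibers ran on a single core, which is consistent with the ~550% CPU ceiling seen in the first benchmark.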

Also fixed the upload endpoint in the same commit — replaced the Flux.Stream-based body read with a direct source pull loop. The stream approach was accumulating the entire body in memory before processing, which caused the OOM + timeout at 0 req/s. The new version counts bytes incrementally as chunks arrive.
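The incremental-counting shape, sketched with a plain `string Seq.t` standing in for vif's streaming source (the real handler pulls chunks from the socket, so this is an analogy, not the actual API): only the running total and the current chunk are live at any point, never the whole body.

```ocaml
(* Count body bytes chunk by chunk without accumulating the body:
   a left fold keeps just the running total in memory. [chunks] is
   a stand-in for the framework's streaming body source. *)
let count_body_bytes (chunks : string Seq.t) : int =
  Seq.fold_left (fun total chunk -> total + String.length chunk) 0 chunks

(* Simulate a body arriving in three chunks. *)
let () =
  let body = List.to_seq [ "hello"; ", "; "world" ] in
  Printf.printf "%d bytes\n" (count_body_bytes body)
```

The broken version effectively concatenated all chunks before measuring them, which is where the 2-4GiB resident memory came from under concurrent 20MB uploads.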

Should be ready for a re-benchmark whenever — expecting a significant jump in the numbers with proper parallelism. 🤞

@BennyFranciscus
Collaborator Author

Oops — the upload streaming fix I pushed in dce05bf had a typo: Flux.Source.Source instead of Flux.Source. Fixed in c91f24d, CI should be green now.

Still waiting on the re-benchmark with multicore domains enabled — the first run was only using 1 core (~550% CPU). With the ~domains fix it should saturate all available cores.

@BennyFranciscus
Collaborator Author

CI is green now ✅ — both detect and validate passing. The benchmark at 09:56 ran on the old code (single-core default, broken upload). Since then:

  • dce05bf: Added ~domains for multicore parallelism (was defaulting to 1 domain)
  • c91f24d: Fixed typo in upload streaming fix
  • 0b60e6e: Cached small_payload at startup (jerry's catch)

Ready for a re-benchmark whenever — expecting a solid jump in throughput with all cores active. 🤞

@MDA2AV
Owner

MDA2AV commented Mar 24, 2026

/benchmark

@github-actions
Contributor

🚀 Benchmark run triggered for vif (all profiles). Results will be posted here when done.

@github-actions
Contributor

Benchmark Results

Framework: vif | Profile: all profiles

vif / baseline / 512c (p=1, r=0, cpu=unlimited)
  Best: 7347 req/s (CPU: 10356.3%, Mem: 1.0GiB) ===

vif / baseline / 4096c (p=1, r=0, cpu=unlimited)
  Best: 7310 req/s (CPU: 10123.1%, Mem: 1.0GiB) ===

vif / baseline / 16384c (p=1, r=0, cpu=unlimited)
  Best: 8868 req/s (CPU: 7199.1%, Mem: 1.1GiB) ===

vif / pipelined / 512c (p=16, r=0, cpu=unlimited)
  Best: 58023 req/s (CPU: 10123.4%, Mem: 1.0GiB) ===

vif / pipelined / 4096c (p=16, r=0, cpu=unlimited)
  Best: 105631 req/s (CPU: 8975.8%, Mem: 1.0GiB) ===

vif / pipelined / 16384c (p=16, r=0, cpu=unlimited)
  Best: 69750 req/s (CPU: 7612.7%, Mem: 1.1GiB) ===

vif / limited-conn / 512c (p=1, r=10, cpu=unlimited)
  Best: 7149 req/s (CPU: 10299.7%, Mem: 1.0GiB) ===

vif / limited-conn / 4096c (p=1, r=10, cpu=unlimited)
  Best: 6983 req/s (CPU: 9352.0%, Mem: 1022.0MiB) ===

vif / json / 4096c (p=1, r=0, cpu=unlimited)
  Best: 106988 req/s (CPU: 8886.9%, Mem: 1.1GiB) ===

vif / json / 16384c (p=1, r=0, cpu=unlimited)
  Best: 72501 req/s (CPU: 7416.2%, Mem: 1.1GiB) ===

vif / upload / 64c (p=1, r=0, cpu=unlimited)
  Best: 6 req/s (CPU: 3517.5%, Mem: 3.0GiB) ===

vif / upload / 256c (p=1, r=0, cpu=unlimited)
  Best: 0 req/s (CPU: 0%, Mem: 0MiB) ===
Full log
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   133.17ms   112.80ms   191.90ms   576.00ms    1.60s

  640368 requests in 5.02s, 315738 responses
  Throughput: 62.84K req/s
  Bandwidth:  514.51MB/s
  Status codes: 2xx=315738, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 315738 / 315738 responses (100.0%)
  Reconnects: 315932
  Errors: connect 0, read 3, timeout 0
  CPU: 7904.5% | Mem: 1.1GiB

[run 3/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/json
  Threads:   64
  Conns:     16384 (256/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   114.43ms   101.70ms   158.90ms   425.70ms   952.60ms

  736425 requests in 5.02s, 363958 responses
  Throughput: 72.44K req/s
  Bandwidth:  593.08MB/s
  Status codes: 2xx=363957, 3xx=0, 4xx=0, 5xx=1
  Latency samples: 363958 / 363958 responses (100.0%)
  Reconnects: 364188

  WARNING: 1/363958 responses (0.0%) had unexpected status (expected 2xx)
  CPU: 7416.2% | Mem: 1.1GiB

=== Best: 72501 req/s (CPU: 7416.2%, Mem: 1.1GiB) ===
[dry-run] Results not saved (use --save to persist)
httparena-bench-vif
httparena-bench-vif

==============================================
=== vif / upload / 64c (p=1, r=0, cpu=unlimited) ===
==============================================
e398da52579e0faf3b3212ebab7c76b400218db5320a6b49e0aba82ee295c18c
[wait] Waiting for server...
[ready] Server is up

[run 1/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     64 (1/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency    3.62s    3.62s    3.73s    4.15s    4.15s

  82 requests in 9.20s, 41 responses
  Throughput: 4 req/s
  Bandwidth:  464B/s
  Status codes: 2xx=41, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 41 / 41 responses (100.0%)
  Reconnects: 41
  Errors: connect 0, read 12, timeout 0
  CPU: 3110.8% | Mem: 2.3GiB

[run 2/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     64 (1/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency    2.86s    2.78s    3.39s    4.79s    4.79s

  110 requests in 10.01s, 55 responses
  Throughput: 5 req/s
  Bandwidth:  572B/s
  Status codes: 2xx=55, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 55 / 55 responses (100.0%)
  Reconnects: 55
  Errors: connect 0, read 6, timeout 0
  CPU: 3006.5% | Mem: 2.6GiB

[run 3/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     64 (1/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency    2.54s    2.63s    2.97s    3.41s    3.41s

  126 requests in 10.00s, 63 responses
  Throughput: 6 req/s
  Bandwidth:  655B/s
  Status codes: 2xx=63, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 63 / 63 responses (100.0%)
  Reconnects: 63
  Errors: connect 0, read 10, timeout 0
  CPU: 3517.5% | Mem: 3.0GiB

=== Best: 6 req/s (CPU: 3517.5%, Mem: 3.0GiB) ===
  Input BW: 120.00MB/s (avg template: 20971593 bytes)
[dry-run] Results not saved (use --save to persist)
httparena-bench-vif
httparena-bench-vif

==============================================
=== vif / upload / 256c (p=1, r=0, cpu=unlimited) ===
==============================================
90281db0dafe91139adc53d4fb9bf49124b5b10077d2af50ae7388e4250ea9ad
[wait] Waiting for server...
[ready] Server is up

[run 1/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     256 (4/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency      0us      0us      0us      0us      0us

  0 requests in 5.00s, 0 responses
  Throughput: 0 req/s
  Bandwidth:  0B/s
  Status codes: 2xx=0, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 0 / 0 responses (0.0%)
  CPU: 7472.1% | Mem: 4.0GiB

[run 2/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     256 (4/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency      0us      0us      0us      0us      0us

  0 requests in 5.00s, 0 responses
  Throughput: 0 req/s
  Bandwidth:  0B/s
  Status codes: 2xx=0, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 0 / 0 responses (0.0%)
  CPU: 7754.5% | Mem: 7.2GiB

[run 3/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     256 (4/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency      0us      0us      0us      0us      0us

  0 requests in 5.00s, 0 responses
  Throughput: 0 req/s
  Bandwidth:  0B/s
  Status codes: 2xx=0, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 0 / 0 responses (0.0%)
  CPU: 7980.0% | Mem: 11.9GiB

=== Best: 0 req/s (CPU: 0%, Mem: 0MiB) ===
httparena-bench-vif
httparena-bench-vif
[restore] Restoring CPU governor to performance...

@BennyFranciscus
Collaborator Author

Benchmark Results Analysis

Thanks for the re-benchmark! The non-upload numbers with multicore enabled look solid:

  • Pipelined 4096c: 105,631 req/s ⚡ (using ~9000% CPU = all cores working)
  • JSON 4096c: 106,988 req/s
  • Baseline 16384c: 8,868 req/s (non-pipelined)

The upload profile is the problem — 6 req/s at 64c, 0 req/s at 256c with memory ballooning to 11.9GiB.

Root cause

Our count_body_bytes handler streams correctly (pulls chunks incrementally, never accumulates), but vif appears to buffer the entire request body in memory before making it available through the Flux.Source API. With 20MB uploads:

  • 64 concurrent = ~1.3GB minimum just for bodies → 3GB actual (GC overhead)
  • 256 concurrent = ~5GB minimum → 11.9GB actual → stalls/OOM

This is a framework-level behavior — there's no application-level workaround. It would need changes in vif's HTTP parser to expose a true streaming body API that yields chunks as they arrive from the socket.

@dinosaure — is there a way to configure vif to not buffer the full request body? Or would that need a change in the framework itself? The upload benchmark sends 20MB POST bodies at high concurrency, so streaming from the socket is pretty important here.

Other notes

  • Small 1/363958 responses (0.0%) had unexpected status in json 16384c — probably just connection pressure, not a real issue
  • The 10000%+ CPU utilization confirms multicore is working properly now 🎉

@dinosaure

Firstly, the recent benchmarks carried out with Vif are based on a pre-release version that I can release in the coming days if necessary.

Furthermore, these are the httpcats and vif pins on these versions:

$ opam pin add httpcats git+https://github.com/robur-coop/httpcats.git#0883aebb68c6bc58eb36c787e1a54090f508ae29
$ opam pin add vif https://github.com/robur-coop/vif.git#8c6ac3fb97cb9a31bf6ad8ec63c942336dcfd03e

Next, from what I can see, count_body_bytes is correct (although we could use Flux.Sink) and vif streams the request content correctly (see this internal function); I would need a reproducible test to understand exactly what is happening.

@BennyFranciscus
Collaborator Author

Thanks for looking into this @dinosaure! You're right — looking at to_source with schedule_read, the Flux.Source API does stream chunks via the callback. And our count_body_bytes just pulls chunks and sums lengths, never accumulating.

So the bottleneck might be elsewhere. A couple of theories:

  1. h1/h2 body reader buffering — does the underlying H1.Body.Reader or the TCP receive path buffer the full body before the first schedule_read fires?
  2. Miou scheduler overhead under high concurrency — 256 concurrent 20MB uploads might be creating too many fibers/tasks

Reproducing

The benchmark tool sends concurrent 20MB POST requests to /upload. You can approximate it with:

# Generate a 20MB file
dd if=/dev/urandom of=/tmp/testfile bs=1M count=20

# Low concurrency (should work fine)
wrk -t4 -c64 -d10s -s post.lua http://localhost:8080/upload
# post.lua: wrk.method = "POST"; wrk.body = io.open("/tmp/testfile"):read("*a")

# High concurrency (this is where it breaks)
wrk -t4 -c256 -d10s -s post.lua http://localhost:8080/upload

At 64c we saw ~6 req/s with memory climbing to 3GB. At 256c it hit 11.9GB and effectively stalled (0 req/s). For reference, other frameworks doing the same streaming-count pattern handle 256c at 300-500+ req/s with stable memory.

If the pre-release versions you mentioned have changes that could help here, happy to pin to those and re-test!

@MDA2AV
Owner

MDA2AV commented Mar 24, 2026

@BennyFranciscus I'll try to fix this locally, stand by on this PR

@dinosaure

With the upstream versions (and 8 cores), I get these results and can confirm (via the logs) that vif does not buffer all the content:

> taskset -c 16-27 wrk -t4 -c256 -d10s -s post.lua http://localhost:8080/upload
Running 10s test @ http://localhost:8080/upload
  4 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.00us    0.00us   0.00us    -nan%
    Req/Sec    17.13     17.48    90.00     84.16%
  248 requests in 10.09s, 20.59KB read
  Socket errors: connect 0, read 0, write 0, timeout 248
Requests/sec:     24.58
Transfer/sec:      2.04KB
> taskset -c 16-27 wrk -t4 -c64 -d10s -s post.lua http://localhost:8080/upload
Running 10s test @ http://localhost:8080/upload
  4 threads and 64 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.17s   322.22ms   1.80s    61.09%
    Req/Sec    15.99     11.01    50.00     74.82%
  514 requests in 10.09s, 42.67KB read
Requests/sec:     50.94
Transfer/sec:      4.23KB

You can change the size of the buffers used via Vif.config and its ~config option. In particular, you can tweak the settings for HTTP/1.1 using H1.Config.t. It is still preferable to use the upstream versions.

@BennyFranciscus
Collaborator Author

Thanks for testing this on your end @dinosaure — really helpful to have the framework author's perspective here!

So if vif isn't buffering, that narrows it down. Your numbers actually confirm what we're seeing though — even on your machine with 8 cores:

  • 256c: ~25 req/s with all 248 requests timing out
  • 64c: ~51 req/s with 1.17s average latency

For 20MB uploads those numbers are in the same ballpark as our benchmark results, so at least we're seeing the same behavior.

Good call on the H1.Config.t buffer tuning — I'll look into adjusting the buffer sizes in our implementation. If larger read buffers reduce the number of schedule_read calls per request, that could help with throughput.

@MDA2AV heads up — dinosaure suggests pinning to the upstream versions and tweaking Vif.config ~config with H1 buffer settings. Might be worth trying before the next benchmark run.
